The RICIS Concept


The University of Houston-Clear Lake established the Research Institute for 
Computing and Information Systems (RICIS) in 1986 to encourage the NASA 
Johnson Space Center (JSC) and local industry to actively support research 
in the computing and information sciences. As part of this endeavor, UHCL 
proposed a partnership with JSC to jointly define and manage an integrated 
program of research in advanced data processing technology needed for JSC’s 
main missions, including administrative, engineering and science responsi- 
bilities. JSC agreed and entered into a continuing cooperative agreement 
with UHCL beginning in May 1986, to jointly plan and execute such research 
through RICIS. Additionally, under Cooperative Agreement NCC 9-16,
computing and educational facilities are shared by the two institutions to
conduct the research.

The UHCL/RICIS mission is to conduct, coordinate, and disseminate research 
and professional level education in computing and information systems to 
serve the needs of the government, industry, community and academia. 
RICIS combines resources of UHCL and its gateway affiliates to research and 
develop materials, prototypes and publications on topics of mutual interest 
to its sponsors and researchers. Within UHCL, the mission is being 
implemented through interdisciplinary involvement of faculty and students 
from each of the four schools: Business and Public Administration,
Education, Human Sciences and Humanities, and Natural and Applied Sciences.
RICIS also collaborates with industry in a companion program. This program
is focused on serving the research and advanced development needs of
industry.

Moreover, UHCL established relationships with other universities and re- 
search organizations, having common research interests, to provide addi- 
tional sources of expertise to conduct needed research. For example, UHCL 
has entered into a special partnership with Texas A&M University to help 
oversee RICIS research and education programs, while other research 
organizations are involved via the "gateway" concept.

A major role of RICIS then is to find the best match of sponsors, researchers 
and research objectives to advance knowledge in the computing and informa- 
tion sciences. RICIS, working jointly with its sponsors, advises on research 
needs, recommends principals for conducting the research, provides tech- 
nical and administrative support to coordinate the research and integrates 
technical results into the goals of UHCL, NASA/JSC and industry.


RICIS Preface 


This research was conducted under auspices of the Research Institute for 
Computing and Information Systems by James M. Keller of the University of 
Missouri-Columbia. Dr. Terry Feagin was the initial RICIS research coordinator 
for this activity. Dr. A. Glen Houston, Director of RICIS and Assistant Professor 
of Computer Science, later assumed the research coordinator assignment. 

Funding was provided by the Information Technology Division, Information 
Systems Directorate, NASA/JSC through Cooperative Agreement NCC 9-16 between 
the NASA Johnson Space Center and the University of Houston-Clear Lake. The 
NASA technical monitor for this activity was Robert N. Lea, of the Software 
Technology Branch, Information Technology Division, Information Systems 
Directorate, NASA/JSC. 

The views and conclusions contained in this report are those of the author and 
should not be interpreted as representative of the official policies, either express or 
implied, of UHCL, RICIS, NASA or the United States Government. 
Introduction


For the second quarter of this research contract, we are going to report progress on 
the following four Tasks (as described in the contract): 

1. Fuzzy set-based decision making methodologies;

2. Feature Calculation;

3. Membership Calculation;

5. Acquisition of images.

Since there has been a delay in acquiring images from NASA, we have devoted 
more energies to tasks 1 and 2, and performed research on task 4: Clustering for curve and 
surface fitting (which was scheduled for the third quarter). The descriptions of our 
progress are written as "stand-alone" sections, including a copy of a manuscript on Quadric 
Shell Clustering which has been submitted to the 1992 IEEE Computer Vision/ Pattern 
Recognition Conference. 

Also included is a Sun 4 tape (TAR format) including source code and images. The 
Read_Me file on the tape will describe its contents. 


Fuzzy set-based decision making methodologies


We devoted most of our efforts this quarter to the development of the theory and 
application of methodologies for decision making under uncertainty. This section contains 
two subreports, the first on properties of general hybrid operators, while the second 
considers some new research on generalized threshold logic units. 



HYBRID AGGREGATION OPERATORS 


1.0 Introduction: 

In this report, we explore the properties of the additive $\gamma$-model where the intersection part
is first considered to be the product of the input values, and the union part is obtained by an
extension of De Morgan's law to fuzzy sets. Then Yager's class of union and intersection is
used in the additive $\gamma$-model. The inputs are weighted to some power that represents their
importance and thus their contribution to the compensation process.

2.0 Fuzzy Aggregation Connectives: 

Fuzzy aggregation connectives are useful for aggregating membership functions. The
resulting membership depends on the type of aggregation connective used, and this type is
dictated by the kind of attitude that we expect from the aggregation connective. These
connectives are very useful in decision analysis and decision making. Several types of fuzzy
connectives have been used:

2.1 The union connective: 

It is used when the aggregated value is required to be high if any one of the input values
($x_i \in [0,1]$) is high. Examples are:

The maximum operator:

$$u(x_1, x_2, \ldots, x_m) = \max(x_1, x_2, \ldots, x_m) \tag{1}$$

Yager's union operator:

$$u(x_1, x_2, \ldots, x_m) = \min\Big\{1, \Big[\sum_{i=1}^{m} x_i^{p}\Big]^{1/p}\Big\}, \quad p \in [0,\infty). \tag{2}$$

2.2 The intersection connective: 

It is used when the aggregated value is required to be high only when all the input values
are high. Examples are:

The minimum operator:

$$i(x_1, x_2, \ldots, x_m) = \min(x_1, x_2, \ldots, x_m) \tag{3}$$

Yager's intersection operator:

$$i(x_1, x_2, \ldots, x_m) = 1 - \min\Big\{1, \Big[\sum_{i=1}^{m} (1 - x_i)^{p}\Big]^{1/p}\Big\}, \quad p \in [0,\infty). \tag{4}$$
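As a minimal sketch, assuming plain Python floats for the membership values and a strictly positive $p$, equations (2) and (4) can be evaluated as follows. The helper names `yager_union` and `yager_intersection` are illustrative and do not come from the report or its accompanying tape.

```python
def yager_union(xs, p):
    """Yager's union, eq. (2): min{1, (sum x_i^p)^(1/p)}, assuming p > 0."""
    return min(1.0, sum(x ** p for x in xs) ** (1.0 / p))


def yager_intersection(xs, p):
    """Yager's intersection, eq. (4): 1 - min{1, (sum (1 - x_i)^p)^(1/p)}."""
    return 1.0 - min(1.0, sum((1.0 - x) ** p for x in xs) ** (1.0 / p))


# Hypothetical membership values used only for illustration.
xs = [0.3, 0.6, 0.8]
for p in (1.0, 2.0, 50.0):
    print(p, yager_intersection(xs, p), yager_union(xs, p))
```

For small $p$ both operators saturate (the union at 1, the intersection at 0), while for large $p$ they approach the maximum and minimum operators of (1) and (3).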

2.3 Mean operators: 

Unlike the intersection and union operators, the mean operator does not take an extremist
position in aggregating the input values; rather, it regards the different criteria as mutually
compensable in nature. It provides an aggregated value $m(x_1, x_2, \ldots, x_m)$ such that

$$\min(x_1, x_2, \ldots, x_m) \le m(x_1, x_2, \ldots, x_m) \le \max(x_1, x_2, \ldots, x_m).$$

For example, the generalized mean is defined by:

$$g(x_1, x_2, \ldots, x_m) = \Big[\sum_{i=1}^{m} w_i x_i^{p}\Big]^{1/p}, \quad \text{where } \sum_{i=1}^{m} w_i = 1. \tag{5}$$

The $w_i$ are weights representing the importance of certain criteria, and $p \in (-\infty, \infty)$.


2.4 Compensatory or hybrid connectives: 

In this type of connective, the high input values are allowed to compensate for the low
ones. For example, the additive and multiplicative $\gamma$-operators are defined as weighted arithmetic
and geometric means of union and intersection operators respectively:

$$A \oplus_{\gamma} B = (1-\gamma)(A \cap B) + \gamma\,(A \cup B), \tag{6}$$

$$A \otimes_{\gamma} B = (A \cap B)^{(1-\gamma)}\,(A \cup B)^{\gamma}. \tag{7}$$

It is clear that both of these operators can act as a pure intersection or union at the extremes:
$\gamma = 0$ and $\gamma = 1$ respectively. But they allow the intersection and union to compensate for each other
when $0 < \gamma < 1$. Thus $\gamma$ can be regarded as the parameter that controls the degree of
compensation.


2.5 The multiplicative $\gamma$-model:


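This model was introduced by Zimmermann and Zysno and is very similar to the
multiplicative $\gamma$-operator:

$$y = \Big(\prod_{i=1}^{m} x_i^{\delta_i}\Big)^{(1-\gamma)} \Big(1 - \prod_{i=1}^{m} (1 - x_i)^{\delta_i}\Big)^{\gamma} \tag{8-a}$$

where

$$\sum_{i=1}^{m} \delta_i = m \tag{8-b}$$

and

$$0 \le \gamma \le 1. \tag{8-c}$$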


The $x_i \in [0,1]$ are the inputs or criteria to be aggregated, $\delta_i$ represents the weight
associated with the input $x_i$ and is related to the importance of that input, and $\gamma \in [0,1]$ controls
the degree of compensation between the union and intersection parts of the operator. Note that
the intersection used in this case is the product of the inputs, each weighted to some power $\delta_i$:

$$y_1 = \bigcap_{i=1}^{m} x_i^{\delta_i} \tag{9}$$

$$y_1 = \prod_{i=1}^{m} x_i^{\delta_i} \tag{10}$$

and the union is obtained from De Morgan's law extended to fuzzy sets:

$$y_2 = 1 - \bigcap_{i=1}^{m} (1 - x_i)^{\delta_i} \tag{11}$$

$$y_2 = 1 - \prod_{i=1}^{m} (1 - x_i)^{\delta_i}. \tag{12}$$

But nothing restricts us from using other types of intersection and union connectives.


3.0 The Additive $\gamma$-model:

Similarly to the $\gamma$-model presented in 2.4, one can define an additive $\gamma$-model as:

$$y = (1-\gamma)\,y_1 + \gamma\,y_2, \tag{13}$$

where $y_1$ and $y_2$ are an intersection and union as given by (9) and (11).


3.1 Using the product as an intersection: 

In particular, using the product for the intersection as given by (10) and (12):

$$y = (1-\gamma) \prod_{i=1}^{m} x_i^{\delta_i} + \gamma \Big(1 - \prod_{i=1}^{m} (1 - x_i)^{\delta_i}\Big). \tag{14}$$
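The following is a small sketch of equation (14), assuming Python 3.8 or later (for `math.prod`) and weights that sum to $m$ as required by (8-b). The function name `additive_gamma` and the sample inputs are hypothetical and chosen only to illustrate how the compensation parameter moves the output between the intersection part and the union part.

```python
import math


def additive_gamma(xs, deltas, gamma):
    """Additive gamma-model of eq. (14) with the product as intersection.

    Returns (y, y1, y2), where y1 is the intersection part and y2 the union part.
    """
    y1 = math.prod(x ** d for x, d in zip(xs, deltas))
    y2 = 1.0 - math.prod((1.0 - x) ** d for x, d in zip(xs, deltas))
    return (1.0 - gamma) * y1 + gamma * y2, y1, y2


# Hypothetical inputs; the weights sum to m = 3, as in constraint (8-b).
xs = [0.2, 0.5, 0.9]
deltas = [1.5, 1.0, 0.5]
for gamma in (0.0, 0.5, 1.0):
    y, y1, y2 = additive_gamma(xs, deltas, gamma)
    print(f"gamma={gamma:.1f}  y1={y1:.4f}  y={y:.4f}  y2={y2:.4f}")
```

With these inputs the output equals the intersection part at $\gamma = 0$, the union part at $\gamma = 1$, and lies between the two otherwise.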


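This model has several interesting properties that we proceed to show using the following partial
derivatives:

$$\frac{\partial y}{\partial \gamma} = y_2 - y_1 \tag{15}$$

$$\frac{\partial y}{\partial x_k} = \delta_k \Big\{ \frac{(1-\gamma)\,y_1}{x_k} + \frac{\gamma\,(1 - y_2)}{(1 - x_k)} \Big\} \tag{16}$$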


Property 1. The sensitivity of $y$ with respect to $x_k$ is proportional to $\delta_k$ and is given by

$$S_{x_k} = \frac{x_k}{y}\,\frac{\partial y}{\partial x_k} = \frac{\delta_k}{y}\Big( (1-\gamma)\,y_1 + \frac{\gamma\,x_k\,(1 - y_2)}{(1 - x_k)} \Big). \tag{17}$$

Proof. This can be easily obtained from (16).

We can see that the contribution of $x_k$ to the compensation process increases or decreases
when the associated value of $\delta_k$ increases or decreases, because the value of the function inside
the parentheses in (17) is always non-negative.


Property 2. The additive $\gamma$-model is a monotonically increasing function with respect to $x_k$.
Proof. This follows because the right hand side of (16) is always non-negative.

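Property 3. The additive $\gamma$-model is a monotonically increasing function with respect to $\gamma$, and
hence

$$y_1 \le y \le y_2. \tag{18}$$

Proof. It can be seen that

$$\prod_{i=1}^{m} (1 - x_i)^{\delta_i} \le (1 - x_j)^{\delta_j} \quad \forall\, j = 1, \ldots, m.$$

In particular, for that input $x_*$ associated with the largest weight $\delta_{\max}$,

$$\prod_{i=1}^{m} (1 - x_i)^{\delta_i} \le (1 - x_*)^{\delta_{\max}} \le (1 - x_*).$$

The last inequality follows since $\delta_{\max} \ge 1$ because of the constraint in (8-b). Therefore

$$y_2 \ge x_*. \tag{19-a}$$

Similarly,

$$\prod_{i=1}^{m} x_i^{\delta_i} \le x_j^{\delta_j} \quad \forall\, j = 1, \ldots, m.$$

In particular,

$$\prod_{i=1}^{m} x_i^{\delta_i} \le x_*^{\delta_{\max}} \le x_*.$$

Therefore

$$y_1 \le x_*. \tag{19-b}$$

It follows from (19) that $y_2 \ge y_1$. Hence, using (15), $\partial y / \partial \gamma \ge 0$.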

Property 4. The range of the additive $\gamma$-model is as follows:

$$x_{\min}^{m} \le y \le 1 - (1 - x_{\max})^{m}, \tag{20}$$

where

$$x_{\min} = \min(x_1, x_2, \ldots, x_m)$$

and

$$x_{\max} = \max(x_1, x_2, \ldots, x_m).$$

Proof.

$$\prod_{i=1}^{m} x_i^{\delta_i} \ge \prod_{i=1}^{m} x_{\min}^{\delta_i} = x_{\min}^{m}, \quad \text{hence}$$

$$y_1 \ge x_{\min}^{m}. \tag{21-a}$$

Similarly,

$$\prod_{i=1}^{m} (1 - x_i)^{\delta_i} \ge \prod_{i=1}^{m} (1 - x_{\max})^{\delta_i} = (1 - x_{\max})^{m}, \quad \text{hence}$$

$$y_2 \le 1 - (1 - x_{\max})^{m}. \tag{21-b}$$

Finally, the range is established as in (20) using (18) and (21).

Therefore the range of the additive $\gamma$-model is limited and does not extend to 0 and 1, unlike
Yager's union and intersection, which can be parametrized to do so. However, if $m$ is
sufficiently large, the range of the additive $\gamma$-model may still suffice for most applications. One
way to enlarge the range while preserving all the properties is to loosen the constraint in (8-b),
and replace it by the following constraint:

There exists at least one $\delta_i \ge 1$. (21-c)

This constraint is necessary in order to preserve property 3. In this case, the range becomes

$$x_{\min}^{\left(\sum_{i=1}^{m} \delta_i\right)} \le y \le 1 - (1 - x_{\max})^{\left(\sum_{i=1}^{m} \delta_i\right)}, \tag{21-d}$$

which gets closer to [0,1] as $\sum_{i=1}^{m} \delta_i$ increases.

4.0 The Additive $\gamma$-model with Yager's union and intersection:

In (13), we can use Yager's intersection and union. Therefore $y_1$ is given by (4) and $y_2$ is
given by (2):

$$y_1 = 1 - \min\{1, f_1(x_i, p)\}, \tag{22-a}$$

where

$$f_1(x_i, p) = \Big[\sum_{i=1}^{m} (1 - x_i)^{p}\Big]^{1/p}, \quad p \in [0,\infty), \tag{22-b}$$

and

$$y_2 = \min\{1, f_2(x_i, p)\}, \tag{23-a}$$

where

$$f_2(x_i, p) = \Big[\sum_{i=1}^{m} x_i^{p}\Big]^{1/p}, \quad p \in [0,\infty). \tag{23-b}$$

Note that

$$f_1(x_i, p) = f_2(1 - x_i, p). \tag{23-c}$$

4.1 Properties of the Yager's union and intersection connectives: 

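Property 1. $y_1$ is monotonically non decreasing, and $y_2$ is monotonically non increasing with
respect to $p$.

Proof. Consider

$$\ln f_2 = \frac{1}{p} \ln\Big[\sum_{i=1}^{m} x_i^{p}\Big].$$

Differentiating with respect to $p$,

$$\frac{\partial \ln f_2}{\partial p} = \frac{1}{p^2 \sum_{i=1}^{m} x_i^{p}} \sum_{k=1}^{m} x_k^{p} \ln\Big[\frac{x_k^{p}}{\sum_{i=1}^{m} x_i^{p}}\Big] \le 0, \quad \text{since } \frac{x_k^{p}}{\sum_{i=1}^{m} x_i^{p}} \le 1.$$

Therefore $f_2(x_i, p)$ is monotonically non increasing with respect to $p$. Thus $y_2$ is monotonically
non increasing with respect to $p$. It follows from (23-c) that $f_1(x_i, p)$ is monotonically non
increasing with respect to $p$, and hence $y_1$ is monotonically non decreasing with respect to $p$.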


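Property 2.

a. $\displaystyle \lim_{p \to 0} y_1 = i_{\min}(x_1, x_2, \ldots, x_m) = \begin{cases} x_k & \text{when } x_i = 1 \ \forall\, i \ne k \\ 0 & \text{otherwise} \end{cases}$

b. $\displaystyle \lim_{p \to \infty} y_1 = \min(x_1, x_2, \ldots, x_m)$

c. $\displaystyle \lim_{p \to 0} y_2 = u_{\max}(x_1, x_2, \ldots, x_m) = \begin{cases} x_k & \text{when } x_i = 0 \ \forall\, i \ne k \\ 1 & \text{otherwise} \end{cases}$

d. $\displaystyle \lim_{p \to \infty} y_2 = \max(x_1, x_2, \ldots, x_m)$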


Proof.

a. Suppose there exists only one input $x_k \ne 1$ while all the other inputs $x_i = 1 \ \forall\, i \ne k$.
Hence

$$f_1(x_i, p) = [(1 - x_k)^{p}]^{1/p} = (1 - x_k) \le 1.$$

Therefore from (22-a), $y_1 = x_k$.

On the other hand, suppose that inputs $x_i \ne 1$ for $i = 1, \ldots, s$ (with $s \ge 2$), while the rest of the
inputs $x_i = 1$ for $i = s+1, \ldots, m$. Hence

$$\lim_{p \to 0} \ln f_1(x_i, p) = \lim_{p \to 0} \frac{1}{p} \ln\Big[\sum_{i=1}^{s} (1 - x_i)^{p}\Big] = \lim_{p \to 0} \frac{1}{p} \ln s = \infty.$$

It follows from (22-a) that $\lim_{p \to 0} y_1 = 0$.


b. Since $\lim_{p \to \infty} f_2(x_i, p) = \max(x_1, x_2, \ldots, x_m)$, as will be proved in d, it follows from
(23-c) that

$$\lim_{p \to \infty} f_1(x_i, p) = \max(1 - x_1, 1 - x_2, \ldots, 1 - x_m) = 1 - \min(x_1, x_2, \ldots, x_m) \le 1.$$

Hence property 2 b. follows directly from (22-a).


c. Suppose there exists only one input $x_k \ne 0$ while all the other inputs $x_i = 0 \ \forall\, i \ne k$.
Hence

$$f_2(x_i, p) = [x_k^{p}]^{1/p} = x_k \le 1.$$

Therefore from (23-a), $y_2 = x_k$.

On the other hand, suppose that inputs $x_i \ne 0$ for $i = 1, \ldots, s$ (with $s \ge 2$), while the rest of the
inputs $x_i = 0$ for $i = s+1, \ldots, m$. Hence

$$\lim_{p \to 0} \ln f_2(x_i, p) = \lim_{p \to 0} \frac{1}{p} \ln\Big[\sum_{i=1}^{s} x_i^{p}\Big] = \lim_{p \to 0} \frac{1}{p} \ln s = \infty.$$

It follows from (23-a) that $\lim_{p \to 0} y_2 = 1$.


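d. Using L'Hopital's rule,

$$\lim_{p \to \infty} \ln(f_2(x_i, p)) = \lim_{p \to \infty} \frac{\ln\big(\sum_{i=1}^{m} x_i^{p}\big)}{p} = \lim_{p \to \infty} \frac{\sum_{i=1}^{m} x_i^{p} \ln x_i}{\sum_{i=1}^{m} x_i^{p}} = \ln x_k,$$

where $x_k = \max(x_1, x_2, \ldots, x_m)$. Therefore

$$\lim_{p \to \infty} f_2(x_i, p) = x_k = \max(x_1, x_2, \ldots, x_m) \le 1.$$

Hence property 2 d. follows directly from (23-a).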

Property 3. $y_1$ and $y_2$ are monotonically non decreasing with respect to $x_k$.

Proof.

$$\frac{\partial f_2}{\partial x_k} = x_k^{p-1} \Big[\sum_{i=1}^{m} x_i^{p}\Big]^{1/p - 1} \ge 0.$$

Therefore $f_2(x_i, p)$, and hence $y_2$, is monotonically non decreasing with respect to $x_k$. It follows
from (23-c) that $f_1(x_i, p)$ is monotonically non increasing, and therefore $y_1$ as given in (22-a) is
monotonically non decreasing with respect to $x_k$.

Property 4. The range of $y_1$ and $y_2$ is as follows:

$$i_{\min}(x_1, x_2, \ldots, x_m) \le y_1 \le \min(x_1, x_2, \ldots, x_m) \tag{24-a}$$

$$\max(x_1, x_2, \ldots, x_m) \le y_2 \le u_{\max}(x_1, x_2, \ldots, x_m) \tag{24-b}$$

Proof. This property follows directly from properties 1 and 2.

Therefore both $y_1$ and $y_2$ can be tuned to act as an intersection and union respectively with
the desired attitude, from the least optimistic to the most optimistic operator, by a proper
selection of the parameter $p$. Property 4 also guarantees that the union is never smaller than the
intersection, even when different values of $p$ are used in $y_1$ and $y_2$:

$$y_2 \ge y_1. \tag{24-c}$$
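The behaviour described by properties 1, 2 and 4 can be observed numerically. The sketch below, which assumes strictly positive $p$ and uses a hypothetical helper `yager_pair` and illustrative inputs, sweeps $p$ and prints the intersection $y_1$ of (22-a) and the union $y_2$ of (23-a).

```python
def yager_pair(xs, p):
    """Return (y1, y2): Yager's intersection (22-a) and union (23-a) for p > 0."""
    f1 = sum((1.0 - x) ** p for x in xs) ** (1.0 / p)
    f2 = sum(x ** p for x in xs) ** (1.0 / p)
    return 1.0 - min(1.0, f1), min(1.0, f2)


# Hypothetical membership values used only for illustration.
xs = [0.3, 0.6, 0.8]
rows = [(p,) + yager_pair(xs, p) for p in (0.5, 1.0, 2.0, 5.0, 20.0, 100.0)]
for p, y1, y2 in rows:
    print(f"p={p:6.1f}  y1={y1:.4f}  y2={y2:.4f}")
```

As $p$ grows, $y_1$ rises toward $\min(x_i)$ and $y_2$ falls toward $\max(x_i)$, and $y_2 \ge y_1$ holds for every pair of values in the sweep.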


4.2 The additive $\gamma$-model with Yager's union and intersection and weighted
inputs.


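The output value has the form

$$y = (1-\gamma)\,y_1 + \gamma\,y_2, \tag{25}$$

where

$$y_1 = 1 - \min\{1, f_1(x_i^{\delta_i}, p)\}, \tag{26-a}$$

where

$$f_1(x_i^{\delta_i}, p) = \Big[\sum_{i=1}^{m} (1 - x_i^{\delta_i})^{p}\Big]^{1/p}, \quad p \in [0,\infty), \tag{26-b}$$

and

$$y_2 = \min\{1, f_2(x_i^{\delta_i}, p)\}, \tag{27-a}$$

where

$$f_2(x_i^{\delta_i}, p) = \Big[\sum_{i=1}^{m} (x_i^{\delta_i})^{p}\Big]^{1/p}, \quad p \in [0,\infty). \tag{27-b}$$

Note that

$$f_1(x_i^{\delta_i}, p) = f_2(1 - x_i^{\delta_i}, p), \tag{27-c}$$

where

$$\sum_{i=1}^{m} \delta_i = m \tag{27-d}$$

and

$$0 \le \gamma \le 1. \tag{27-e}$$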



The $x_i \in [0,1]$ are the inputs or criteria to be aggregated, $\delta_i$ represents the weight
associated with the input $x_i$ and is related to the importance of that input, and $\gamma \in [0,1]$ controls
the degree of compensation between the union and intersection parts of the operator.

It is easy to see that $y_1$ and $y_2$ still satisfy all the properties shown in 4.1, if we replace the
inputs $x_i$ by their weighted version
$a_i = x_i^{\delta_i}$.
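A minimal sketch of the weighted model of this section follows, assuming strictly positive $p$. The function name `weighted_yager_gamma` and the sample inputs are hypothetical. The inputs are first replaced by their weighted versions and then passed to Yager's intersection and union.

```python
def weighted_yager_gamma(xs, deltas, gamma, p):
    """Additive gamma-model with Yager's connectives on weighted inputs.

    Each input x_i is replaced by a_i = x_i ** delta_i before aggregation.
    """
    a = [x ** d for x, d in zip(xs, deltas)]
    f1 = sum((1.0 - ai) ** p for ai in a) ** (1.0 / p)
    f2 = sum(ai ** p for ai in a) ** (1.0 / p)
    y1 = 1.0 - min(1.0, f1)   # intersection part
    y2 = min(1.0, f2)         # union part
    return (1.0 - gamma) * y1 + gamma * y2


# Hypothetical inputs; the weights sum to m = 3.
xs = [0.2, 0.5, 0.9]
deltas = [1.5, 1.0, 0.5]
for gamma in (0.0, 0.5, 1.0):
    print(gamma, weighted_yager_gamma(xs, deltas, gamma, p=4.0))
```

Raising any single input, or raising $\gamma$, should never lower the output, in line with the monotonicity properties stated in this section.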

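This model has several interesting properties that we proceed to show using the following
partial derivatives:

$$\frac{\partial y}{\partial \gamma} = y_2 - y_1 \tag{28}$$

$$\frac{\partial y}{\partial x_k} = \frac{\partial y}{\partial a_k}\,\frac{\partial a_k}{\partial x_k} = \delta_k\, x_k^{\delta_k - 1} \Big\{ \frac{\partial y}{\partial a_k} \Big\} \tag{29}$$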

Property 1. The sensitivity of $y$ with respect to $x_k$ is proportional to $\delta_k$ and is given by

$$S_{x_k} = \frac{x_k}{y}\,\frac{\partial y}{\partial x_k} = \frac{\delta_k\, x_k^{\delta_k}}{y} \Big( \frac{\partial y}{\partial a_k} \Big). \tag{30}$$

Proof. This can be easily obtained from (29).

We can see that the contribution of $x_k$ to the compensation process increases or decreases
when the associated value of $\delta_k$ increases or decreases, because the value of the function inside
the parentheses in (30) is always non-negative.


Property 2. The additive $\gamma$-model using Yager's union and intersection is a monotonically
increasing function with respect to $x_k$.

Proof. This follows because the right hand side of (29) is always non-negative from property 3
in 4.1 if $x_k$ is replaced by $a_k$.

Property 3. The additive $\gamma$-model using Yager's union and intersection is a monotonically
increasing function with respect to $\gamma$, and hence

$$y_1 \le y \le y_2. \tag{31}$$

Proof. This property follows directly from (28) and (24-c).


Property 4. The range of the additive $\gamma$-model using Yager's union and intersection is as
follows:

$$i_{\min}(x_1^{\delta_1}, x_2^{\delta_2}, \ldots, x_m^{\delta_m}) \le y \le u_{\max}(x_1^{\delta_1}, x_2^{\delta_2}, \ldots, x_m^{\delta_m}), \tag{32-a}$$

where

$$i_{\min}(x_1^{\delta_1}, x_2^{\delta_2}, \ldots, x_m^{\delta_m}) = \begin{cases} x_k^{\delta_k} & \text{when } x_i^{\delta_i} = 1 \ \forall\, i \ne k \\ 0 & \text{otherwise} \end{cases} \tag{32-b}$$

$$u_{\max}(x_1^{\delta_1}, x_2^{\delta_2}, \ldots, x_m^{\delta_m}) = \begin{cases} x_k^{\delta_k} & \text{when } x_i^{\delta_i} = 0 \ \forall\, i \ne k \\ 1 & \text{otherwise} \end{cases} \tag{32-c}$$

Proof. This property follows directly from (24-a), (24-b) and (31).

Therefore the range of the additive $\gamma$-model using Yager's union and intersection is not as
limited as the additive $\gamma$-model using the product as an intersection, and does extend to 0 and 1
depending on the choice of the parameter $p$.


Property 5. In the case where the constraint on the $\delta_i$ is loosened as in (21-c), all the previous
properties still hold. In addition, the additive $\gamma$-model using Yager's union and intersection is a
monotonically non increasing function with respect to $\delta_k$.

Proof.

$$\frac{\partial y}{\partial \delta_k} = \frac{\partial y}{\partial a_k}\,\frac{\partial a_k}{\partial \delta_k} = x_k^{\delta_k} \ln x_k \Big[ \frac{\partial y}{\partial a_k} \Big] \tag{33}$$

The expression on the right hand side is obviously negative since $x_i \in [0,1]$ and the quantity
inside the brackets is positive.
GENERALIZED THRESHOLD LOGIC UNITS



1. Introduction 

Finite automata first emerged as a model of neural networks in the work of McCulloch and 
Pitts (1943) and are natural extensions of switching circuits. Switching circuits consisting of 
conventional gates (developed from boolean logic) can require many components and 
interconnections whereas another type of switching device called the threshold logic unit 
(developed from threshold logic), usually does not. There are systematic methods to synthesize 
networks consisting of threshold logic units, and these methods can be used to synthesize decision 
networks when the input and output variables are binary. This method of synthesis avoids the 
problems due to local minima in other network training schemes such as the back propagation 
algorithm. Extension of binary logic synthesis methods to multiple valued logic synthesis methods 
will enable us to synthesize decision networks when the input/output variables are not binary. This 
will be discussed in section 7. We now discuss the idea behind the threshold logic unit and its
applications to pattern classification.

2. Threshold logic unit 

A threshold unit (gate) consists of $n$ two-valued inputs $x_1, \ldots, x_n$ and a single two-valued
output $y$. Its internal parameters are a threshold $T$ and weights $w_1, \ldots, w_n$, where each weight is
associated with a particular input variable $x_i$. The values of the threshold $T$ and the weights $w_i$
($i = 1, \ldots, n$) may be any real, finite, positive or negative numbers. The input-output relation of a
threshold logic element is defined as follows:

$$y = 1 \ \text{ if and only if } \ \sum_{i=1}^{n} w_i x_i \ge T$$

$$y = 0 \ \text{ if and only if } \ \sum_{i=1}^{n} w_i x_i < T$$

where the sum and product operations are the conventional arithmetic ones. The sum $\sum_{i=1}^{n} w_i x_i$ is
called the weighted sum of the element. The symbolic representation of the threshold logic unit is
shown in the figure below.
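The input-output relation above translates directly into code. The following is a minimal sketch in Python. The function name `tlu` and the two-input example weights are illustrative assumptions and are not taken from the report.

```python
def tlu(x, w, T):
    """Threshold logic unit: output 1 when the weighted sum reaches T, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum >= T else 0


# Hypothetical example: with unit weights and T = 1.5, a two-input unit
# fires only when both inputs are 1, so it behaves like a logical AND.
w, T = (1.0, 1.0), 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, tlu(x, w, T))
```

Lowering the threshold to 0.5 with the same weights would turn the same unit into a logical OR, which illustrates how a single gate covers functions that need several conventional gates.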



3. Application to pattern recognition 

The main purpose of a pattern recognition system is to make decisions concerning class
membership. In the two-class classifying problem, as well as in the multiclass problem, an equation
of a surface that separates the pattern classes is of great interest. If the surface is hyperplanar, then
we will call this equation a hyperplane equation (or decision function) $d[X]$, and it can be
represented as follows:

$$d[X] = w_1 x_1 + \cdots + w_n x_n = T$$

or

$$d[X] = \sum_{i=1}^{n} w_i x_i = T$$

where $X = (x_1, \ldots, x_n)^{t}$ is called a pattern vector, $x_i$ is the value of the $i$th feature,
$w_i$ is the $i$th weight, and $T$ is the threshold.

As an illustration, for the two-class two-feature case, for values of $X$ that make $d[X] \ge T$, we can
consider $X$ as belonging to one class, and for $d[X] < T$, we can consider $X$ as belonging to another. The
two pattern populations can be separated by the linear equation $d[X] = w_1 x_1 + w_2 x_2 = T$, as
shown below.



From the figure, it is clear that for values of $x_1$ and $x_2$ that make $d[X] \ge T$, we can consider
$X$ as belonging to $C_1$, and to $C_2$ if $d[X] < T$. This is in the form of the threshold unit discussed
previously, and the next step is to determine the parameters $w$ and $T$. Methods for obtaining these
parameters are discussed in the next section.

4. Method of obtaining the hyperplane equation via threshold 
logic 

The method of obtaining the hyperplane equation that separates different classes can be 
illustrated by the following two-class three-feature problem. Suppose we have 8 patterns each with 
three features with 4 patterns classified as class 0 and 4 patterns classified as class 1. Let the 
following truth table describe the problem. 



  Features         Class       Hyperplane equation
  x1   x2   x3    output d    w1 x1 + w2 x2 + w3 x3 = T

  0    0    0     0           0 < T
  0    0    1     0           w3 < T
  0    1    0     0           w2 < T
  0    1    1     1           w2 + w3 ≥ T
  1    0    0     0           w1 < T
  1    0    1     1           w1 + w3 ≥ T
  1    1    0     1           w1 + w2 ≥ T
  1    1    1     1           w1 + w2 + w3 ≥ T

From the above truth table, for the output to be class 0 ($d = 0$), four inequalities yield

$$0 < T, \qquad w_3 < T, \qquad w_2 < T, \qquad w_1 < T$$

and for the output to be class 1 ($d = 1$), four inequalities yield

$$w_2 + w_3 \ge T, \qquad w_1 + w_3 \ge T, \qquad w_1 + w_2 \ge T, \qquad w_1 + w_2 + w_3 \ge T.$$

Combining these inequalities, we have 



and finally the condition,

$$0 < w_1, w_2, w_3 < T. \tag{*}$$

As a possible selection, choosing $w_1 = w_2 = w_3 = 1$ and $T = 1.5$,
the hyperplane equation now becomes

$$x_1 + x_2 + x_3 = 1.5.$$

It is important to note that there are an infinite number of $w_i$'s and $T$'s that satisfy (*). As an
illustration, the following figure shows how the hyperplane equation separates one class of vectors
from the other.


[Figure: pattern vectors such as (0,0,1) and (0,1,1) plotted on the x1, x2, x3 axes, with the separating hyperplane.]



Also, a possible threshold logic unit realization for proper classification is shown as follows: 



Although for this problem the parameters (i.e., the $w_i$'s and $T$) are fairly easy to obtain by examining
the inequalities from the truth table, other pattern classification problems that deal with many input
features can be less trivial. For 9 inputs, 512 inequalities are needed. In general, $n$ input features
require $2^n$ inequalities. The solution to such a large set of inequalities is a challenging computation.
Therefore, a simpler and more effective systematic method for solving for the parameters is desired.
A possible method is presented in the following section.
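To make the size of the problem concrete, the sketch below checks a candidate set of parameters against every row of the truth table of section 4 by brute force. It assumes the class labels read from that table. The names `TRUTH` and `realizes` are hypothetical helpers, not part of the report's software.

```python
from itertools import product

# Class labels copied from the truth table of section 4, keyed by (x1, x2, x3).
TRUTH = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 0, (0, 1, 1): 1,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 1,
}


def realizes(w, T, table):
    """Return True if weights w and threshold T reproduce every row of the table."""
    for x, d in table.items():
        fired = sum(wi * xi for wi, xi in zip(w, x)) >= T
        if fired != bool(d):
            return False
    return True


print(realizes((1, 1, 1), 1.5, TRUTH))       # the selection made in the text
print(realizes((2, 2, 2), 3.0, TRUTH))       # a scaled alternative
print(len(list(product((0, 1), repeat=9))))  # number of patterns for 9 inputs
```

Checking one candidate is cheap, but the number of rows doubles with every added feature, which is the motivation for the systematic method that follows.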


5. A systematic method for obtaining the TLU's parameters 

In order to find a simpler and more effective systematic method for obtaining a linear
hyperplane that separates input patterns into different classes, properties of linear separability are
examined.

Given $n$ binary valued features, there exist $2^n$ input patterns. If there exists a linear
equation that separates patterns that correspond to outputs equal to zero (false nodes) from patterns
that correspond to outputs equal to one (true nodes), then the patterns are linearly separable. This
linear equation corresponds to an $(n-1)$-dimensional hyperplane. From the $n$ (binary valued)
features, a Boolean function $f$ consisting of minimal sums-of-products (MSP) can be obtained by
means of Quine-McCluskey tabular methods. Now, using the concept of lattices, for a hyperplane
equation that separates the patterns to exist, it is necessary for each variable $x_i$ ($x_j$) in the MSP
function $f$ to appear only in uncomplemented (complemented) form. If so, $f$ is said to be
unate. Unfortunately, the property of unateness is only a necessary condition for linear separability
and not a sufficient one. Next, in order to obtain this hyperplane, it is necessary to find the two
different sets of patterns that are the closest to each other. This can be achieved by obtaining the
minimal true nodes and the maximal false nodes. The minimal true nodes are the set of patterns that
constitute the minterms of the MSP function $f$. The maximal false nodes are found by
determining all false nodes with just one feature whose value is 0, then all false nodes with two
features whose value is 0, and so on, leaving out all nodes smaller than the ones already selected.
To determine whether or not $f$ is linearly separable, and if it is, to find an appropriate set of
parameters, it is necessary to determine the coefficients of the hyperplane equation. This is
accomplished by deriving and solving a system of $pq$ inequalities (i.e., all combinations of
false($\sum_{i=1}^{n} w_i x_i$) < true($\sum_{i=1}^{n} w_i x_i$)), corresponding to the $p$ minimal true and $q$ maximal false nodes. If
the $pq$ inequalities can be solved, then there exists a hyperplane that separates the patterns correctly. For
the example mentioned in section 4, the MSP form is $f = x_1 x_2 + x_2 x_3 + x_1 x_3$ and is unate. The minimal
true nodes are (1,1,0), (0,1,1), and (1,0,1), and the maximal false ones are (0,0,1), (0,1,0), and
(1,0,0). The system of inequalities yields all combinations of

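$$\left.\begin{matrix} w_1 \\ w_2 \\ w_3 \end{matrix}\right\} < \left\{\begin{matrix} w_1 + w_2 \\ w_2 + w_3 \\ w_1 + w_3 \end{matrix}\right. \tag{**}$$

and reducing the inequalities, we have

$$0 < w_1, \qquad 0 < w_2, \qquad 0 < w_3.$$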

If we let $w_1 = w_2 = w_3 = 1$ and substitute into (**) above, we obtain $1 < 2$ for all combinations of
the system of inequalities. Now, since we want the threshold $T$ to be located somewhere between
all combinations of false($\sum_{i=1}^{n} w_i x_i$) < true($\sum_{i=1}^{n} w_i x_i$), $T = 1.5$ is a possible choice. This agrees with the
results obtained from the example in section 4.
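The last step of this procedure can be sketched as follows, assuming the minimal true and maximal false nodes listed for the example and a candidate weight vector. The helper names and the choice of the midpoint for $T$ are illustrative assumptions. Any value strictly above the largest false sum and not above the smallest true sum would serve equally well.

```python
# Minimal true and maximal false nodes of the section 4 example.
MIN_TRUE = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
MAX_FALSE = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]


def weighted_sum(w, x):
    """Conventional weighted sum of a pattern x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def threshold_gap(w, true_nodes, false_nodes):
    """Return (largest false sum, smallest true sum) for the weights w."""
    largest_false = max(weighted_sum(w, x) for x in false_nodes)
    smallest_true = min(weighted_sum(w, x) for x in true_nodes)
    return largest_false, smallest_true


w = (1, 1, 1)
lo, hi = threshold_gap(w, MIN_TRUE, MAX_FALSE)
if lo < hi:
    T = (lo + hi) / 2  # midpoint, one possible choice among many
    print(f"separable with w={w}: {lo} < T={T} <= {hi}")
else:
    print(f"weights {w} do not separate the two sets of nodes")
```

With unit weights the gap is between 1 and 2, so the midpoint reproduces the threshold of 1.5 chosen in the text.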


6. Introduction to multiple valued logic 

Multivalued logic is a generalization of binary logic for an arbitrary number $n$ of truth
values, where $n > 2$. The truth values (or degrees of truth) are usually chosen to be rational numbers
between 0 and 1. The set $T_n$ of truth values is usually defined as

$$T_n = \Big\{ 0, \frac{1}{n-1}, \frac{2}{n-1}, \ldots, \frac{n-2}{n-1}, 1 \Big\}.$$


Many multivalued logics can be formulated depending on how the basic logic operations of 
disjunction, conjunction, negation, and implication are defined. For example Lukasiewicz logic 
uses the following definitions. 

$$\bar{a} = 1 - a, \qquad a \vee b = \max(a, b), \qquad a \wedge b = \min(a, b),$$

$$\text{and} \quad a \rightarrow b = \min(1, 1 + b - a). \tag{1}$$
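These four definitions are easy to experiment with. The sketch below assumes exact rational truth values via Python's `fractions.Fraction` and, purely for illustration, a five-valued logic with evenly spaced truth values. The function names are hypothetical.

```python
from fractions import Fraction


def neg(a):
    """Negation: 1 - a."""
    return 1 - a


def disj(a, b):
    """Disjunction: max(a, b)."""
    return max(a, b)


def conj(a, b):
    """Conjunction: min(a, b)."""
    return min(a, b)


def impl(a, b):
    """Implication: min(1, 1 + b - a)."""
    return min(1, 1 + b - a)


# Assumed truth values of a five-valued logic: 0, 1/4, 1/2, 3/4, 1.
T5 = [Fraction(k, 4) for k in range(5)]
for a in T5:
    print(a, [str(impl(a, b)) for b in T5])
```

Each row of the printed table gives the implication values for one antecedent. The implication is fully true whenever the antecedent does not exceed the consequent, and every result stays inside the same set of truth values.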

When $n \to \infty$, we obtain an infinite-valued logic where the truth values are all rational numbers in
the interval [0,1] taken from the set $T_\infty$. If we insist on taking all real numbers in the interval [0,1]
rather than those from the set $T_\infty$, we obtain an alternative infinite-valued logic, usually
denoted by $L_1$. This is also known as standard Lukasiewicz logic. However, these two infinite-valued
logics are essentially equivalent if one is concerned only with the tautologies they represent. There
is a one-to-one correspondence (isomorphism) between set theory and logic because the set
theoretic concepts of union, intersection, complement and inclusion correspond to the logical
concepts of disjunction, conjunction, negation, and implication.

Fuzzy set theory is a generalization of classical set theory where the membership value of an
element in a set can take any value in the unit interval [0,1]. Given the isomorphism between set
theory and logic, one can view fuzzy set theory based on the max, min, and $1 - a$ operators for union,
intersection and complement as an infinite-valued standard Lukasiewicz logic $L_1$. Similarly, if we
restrict the membership values that can occur in fuzzy set theory to the set $T_n$, then we obtain a
discrete fuzzy set theory which is essentially equivalent to multivalued logic. (One can make $n$ as
large as one wishes, depending on the desired accuracy of representation.)

For multi-criteria decision making and information fusion methods based on fuzzy set 
theory for pattern recognition and computer vision, the aggregation takes place in a hierarchical 
network, where the type of aggregation (conjunctive, disjunctive, etc) at each node is determined 
through a learning procedure. The learning procedures we have developed are based on gradient 
descent, and use training data to adjust the parameters of the nodes so that the sum of squared 
errors between the desired output and the actual output is minimized. Two powerful aspects of 
these learning methods are that they are capable of i) eliminating uninformative and unreliable 
criteria (i.e., pruning the decision tree), and ii) generating a set of decision rules automatically
from the training data. However, since these algorithms are based on gradient descent methods,
they can be slow and sometimes they may converge to local minima. Thus, alternative methods to
synthesize such aggregation networks are highly desirable. One consequence of the equivalence
between discrete fuzzy set theory and multivalued logic is that we can borrow concepts from 
multivalued logic to analyze and synthesize aggregation networks based on fuzzy set theory. 

The problem of function minimization (to eliminate redundancy) and synthesis has been 
discussed in section 5 for the binary case (e.g., Quine-McCluskey). Several techniques are based
on the concept of lattices. Algorithms to implement the resulting binary functions in terms of
threshold logic units also exist in the literature (e.g., McNaughton). Hampson et al. have derived
some theoretical results for the case when the inputs are multi-valued, but the output is binary.
Although a binary output may be sufficient for some pattern recognition applications, it is not 
general. Also, the authors do not suggest any methods for the construction of such networks. 
Recently there has been some interest in function minimization methods for multi-valued logic. The 
continued work proposed herein will draw from the existing literature and analyze both the 
theoretical and practical aspects of multi-valued logic function synthesis. These results will then be
used to construct the decision functions in terms of fuzzy logic units for pattern recognition and
computer vision. The advantages of the proposed methods are that



i) they are highly amenable to theoretical analysis, 

ii) redundancy detection is handled naturally in the function minimization process, and 

iii) the resulting network is always guaranteed to be globally optimal for the training data. 

In fact, in the case of pattern recognition problems, the classification error on the training data will 
be zero. 
Feature Calculation


Because of a delay in obtaining imagery from NASA, we have
postponed much of the work on this task until the data become available.
As mentioned in the first quarter report, we have available numerous algorithms which we 
routinely use to generate features from digital images. These are ready, and will be run on 
the data when it arrives. The section on Acquisition of Images will detail our progress on 
generating our own "simulation" imagery. 


Calculation of Membership Functions 


Our work in this area has progressed nicely. We have designed and implemented 
numerous algorithms to generate membership values from a set of training data using 
histograms, results of fuzzy clustering, and heuristic definitions. We have also made 
progress in the transformation of "probability density functions" into possibility 
distributions for use in assigning membership values to individual points. Since this task
extends into the third quarter, we will postpone the complete write-up until the
third quarter report. We hope to supply, at that time, a preprint of a paper
describing our new results. We feel that this approach is the most useful, since the
paper will contain a concise statement of the problem and its solution.


Acquisition of Images 

As mentioned in the beginning of this report, we are waiting to receive 
imagery from NASA on which to test our algorithms. In the meantime, we have built a 
scale model of the shuttle, and built a mechanism to position this model at known 
orientations relative to the camera. We have begun to digitize images of this model to test 
some of the algorithms while we are waiting for the NASA pictures. We are including a 
few of these images both in hard copy form and on the accompanying tape. 














Clustering for Curve and Surface Fitting 


The best way to describe the new work in this task is to include a copy of a 
manuscript recently submitted by Dr. Krishnapuram and two of the graduate students 
supported by this contract to the 1992 IEEE Computer Vision/Pattern Recognition
Conference.

The title of the paper is:

“Quadric Shell Clustering Algorithms and Their Applications”.

This represents the extension of the previously reported work to clustering edge
data into general quadratic curves. We are also extending this approach to 3-dimensional
data sets (i.e., surfaces).


Quadric Shell Clustering Algorithms and Their Applications 

Raghu Krishnapuram, Hichem Frigui, and Olfa Nasraoui 
Department of Electrical and Computer Engineering 
University of Missouri, Columbia, MO 65211 

ABSTRACT 

In this paper, we introduce new hard and fuzzy clustering algorithms called the C Spherical 
Shells (CSS) algorithms and the C Quadric Shells (CQS) algorithms. The C Spherical Shells 
algorithms are specially designed to search for clusters that can be described by circular arcs, or 
more generally by shells of hyperspheres. The C Quadric Shells algorithms are expressly designed 
to seek clusters that can be described by segments of second-degree curves, or more generally by 
segments of shells of hyperquadrics. Most previous clustering algorithms assume that the clusters
are “filled”, i.e., they are not hollow. Such algorithms cannot cluster data that lie on shell-like
subspaces of the feature space. The advantage of our CQS algorithms lies in the fact that they can 
be used to cluster mixtures of all types of hyperquadrics such as hyperspheres, hyperellipsoids, 
hyperparaboloids, hyperhyperboloids, and hypercylinders. We also introduce cluster validity 
measures for shell-like clusters, and show that the validity measure can be used to determine the 
number of clusters when this is not known. Several examples of clustering in the two-dimensional 
case are shown and their applications are suggested. These algorithms can easily outperform the
traditional algorithms based on the Hough transform for boundary detection.


Summary


1. This paper is about partitioning n-dimensional data points which are assumed to lie on (possibly
an unknown number of) hyperquadric surfaces into meaningful clusters. In other words, the 
clusters we deal with are described by shell-like subspaces of the original feature space. 

2. We introduce new hard and fuzzy clustering algorithms called the C Spherical Shells (CSS) 
algorithms and the C Quadric Shells (CQS) algorithms. The CSS algorithms search for clusters that can
be described by circular arcs, or more generally by shells of hyperspheres. The CQS algorithms 
seek clusters that can be described by segments of second-degree curves, or more generally by 
segments of shells of hyperquadrics. We also introduce cluster validity measures for shell-like 
clusters, which can be used to determine the number of clusters when this is not known. 

3. Most objective-function-based clustering algorithms in the literature consider only “filled” 
clusters, and hence they cannot be used when the clusters are hollow. The few shell clustering 
algorithms that have considered hollow clusters work only for clusters of specific shapes such as 
circles or ellipses in 2-D. The few existing algorithms are also implementationally complex since 
they require solving coupled nonlinear equations for the shell parameters. They do not perform 
well on partial shells either. Our algorithms do not involve nonlinear equations, i.e., they have
solutions in closed form. Thus, they are much faster. Moreover, they can be used to cluster
mixtures of all types of hyperquadric shells such as hyperspheres, hyperellipsoids, 
hyperparaboloids, hyperhyperboloids, and hypercylinders. Our algorithms can easily outperform 
the traditional algorithms based on the Hough transform in boundary detection and other 
applications. 

4. The proposed algorithms can be used for various applications such as boundary detection and 
pattern classification. They can also be used for surface fitting and description in 3-D (i.e., with
range data). These algorithms can potentially lead to a more general class of algorithms that deal
with shells of more complex types. 



1. Introduction 


Many clustering algorithms have been suggested and used in the literature to partition data 
into clusters. Clustering algorithms can be categorized into two classes, depending on whether a 
feature point belongs to just one cluster or to all C clusters, albeit to different degrees. These two 
classes are known as hard (crisp) and fuzzy algorithms respectively. There is an entire class of 
clustering algorithms in which an objective function based on a distance measure is iteratively 
minimized to obtain the final partition [1,2]. The distance measure chosen and the objective 
function being optimized depend on the geometric structure of the clusters. For example, the
K-means algorithm, using the Euclidean distance, looks for hyperspherical clusters [3]. The
Gustafson-Kessel algorithm uses a weighted Mahalanobis distance, and can detect hyperellipsoidal 
and hyperplanar clusters [1]. 

Until recently, it has been difficult to detect clusters that can be described by shell-like 
subspaces, i.e., clusters that are not “filled” but are hollow. Dave's [4,5] Fuzzy C Shells (FCS)
algorithm has proven to be successful in detecting clusters that can be described by circular arcs, or
more generally by shells of hyperspheres. Several impressive examples involving two-dimensional
data sets are given in [4,5]. Dave et al. have also generalized this algorithm to the case of ellipsoidal
shells [6,7], and this algorithm is known as the Fuzzy Adaptive C-Shells (FACS) algorithm.
However, the FCS and the FACS algorithms are complex to implement since they involve the
use of Newton's method to solve coupled nonlinear equations for the shell (prototype) parameters.
Moreover, the performance of the FACS algorithm is not good for partial curves. A modification to
the FCS algorithm has been suggested by Bezdek et al. to reduce the computational burden [8].

In this paper, we propose new C Spherical Shells (CSS) algorithms that do not involve
coupled nonlinear equations, i.e., they have solutions in closed form. This makes our algorithms
straightforward and, more importantly, computationally more attractive. In addition, we present a
new set of hard and fuzzy clustering algorithms that generalize the Fuzzy C Shells and the
Adaptive Fuzzy C Shells algorithms. We call these algorithms the C Quadric Shells (CQS) algorithms.
They use an objective function based on a new distance measure, and they seek clusters which can
be described by segments of second-degree curves, or more generally by segments of shells of
hyperquadrics. We also propose algorithms to determine the optimum number of clusters C, when
this is not known. These algorithms involve minimizing a validity (performance) measure called
the shell thickness. One major advantage of our CQS algorithms is that they are able to partition a
composite mixture of different types of hyperquadric shells, whereas previous cluster-based and
non-cluster-based algorithms apply to a specific type of hyperquadric. For example, Hough
techniques to find analytical curves can be implemented efficiently only for specific types of curves
[9]. This aspect is discussed further in Section 6. The other advantages of our algorithms are that
they are computationally simple and easy to implement, their memory and CPU time requirements
are reasonable, and they do not require a priori knowledge of the number of clusters present in
the input data set.

Section 2 presents the hard and fuzzy versions of our C Spherical Shells algorithm. Section 
3 introduces the hard and fuzzy versions of our C Quadric Shells (CQS) algorithm. Section 4 
describes cluster validity measures and unsupervised algorithms which can be used to determine 
the optimum number of clusters when this is not known a priori. In Section 5 several examples of 
clustering using the proposed algorithms are presented and applications in computer vision and 
pattern recognition are suggested. Finally, Section 6 gives the summary and conclusions.

2. The C Spherical Shells (CSS) Algorithms 

In the case of C Spherical Shells algorithms, the assumption is that each cluster resembles a
hyperspherical shell, or part thereof. Let $x_j$ be a point in the feature space. The prototypes $\lambda_i$
consist of two parameters $(c_i, r_i)$, where $c_i$ is the center of the hypersphere and $r_i$ is the radius. We
define the distance from $x_j$ to a prototype $\lambda_i = (c_i, r_i)$ as

$$d_{Sij}^2 = d_S^2(x_j, \lambda_i) = \left( \|x_j - c_i\|^2 - r_i^2 \right)^2. \qquad (1)$$

The subscript S in the above equation stands for “spherical”. Note that the right hand side of (1),
when equated to zero, also gives the equation of the hypersphere. In general, the closer $x_j$ is to the
specific hypersphere, the smaller the distance will be. Based on this distance measure, we now
define the hard and fuzzy C Spherical Shells algorithms.


2.1 The Hard C Spherical Shells (HCSS) Algorithm 


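We define the objective function to be minimized in this case as

$$J_S(L) = \sum_{i=1}^{K} \sum_{x_j \in X_i} d_{Sij}^2, \qquad (2)$$

where $L = (\lambda_1, \ldots, \lambda_K)$, $K$ is the number of clusters, and $X_i$ is the set of points assigned to
cluster $i$. In order to minimize the objective function in (2), we rewrite the distance in (1) as

$$d_{Sij}^2 = p_i^T M_j p_i + v_j^T p_i + b_j,$$

where

$$b_j = (x_j^T x_j)^2, \quad M_j = y_j y_j^T, \quad v_j = 2 (x_j^T x_j)\, y_j, \quad y_j = \begin{bmatrix} x_j \\ 1 \end{bmatrix}, \quad \text{and} \quad p_i = \begin{bmatrix} -2 c_i \\ c_i^T c_i - r_i^2 \end{bmatrix}. \qquad (3)$$

This rewrite holds because $\|x_j - c_i\|^2 - r_i^2 = x_j^T x_j + y_j^T p_i$. Therefore,

$$J_S(L) = \sum_{i=1}^{K} \sum_{x_j \in X_i} \left( p_i^T M_j p_i + v_j^T p_i + b_j \right). \qquad (4)$$

We may assume that the vectors $p_i$ are independent of each other. Hence, the vectors $p_i$ that
minimize (4) must satisfy

$$\sum_{x_j \in X_i} \left( 2 M_j p_i + v_j \right) = 0. \qquad (5)$$

If we define

$$H_i = \sum_{x_j \in X_i} M_j \quad \text{and} \quad w_i = \sum_{x_j \in X_i} v_j, \qquad (6)$$

from (5) we obtain

$$p_i = -\frac{1}{2} H_i^{-1} w_i. \qquad (7)$$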

The resulting Hard C Spherical Shells (HCSS) algorithm is summarized below.


THE HARD C SPHERICAL SHELLS (HCSS) ALGORITHM:
    Fix the number of clusters $K$;
    Set iteration counter $l = 1$ and initialize the hard K-partition;
    Repeat
        Calculate $H_i^{(l)}$ and $w_i^{(l)}$ for each cluster using (6);
        Compute $p_i^{(l)}$ for each cluster using (7);
        Classify $x_j$ into cluster $\lambda_i$ if $d_{Sij}^2 < d_{Skj}^2$ for all $k \neq i$;
        Increment $l$;
    Until ($\|p^{(l-1)} - p^{(l)}\| < \epsilon$);
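The closed-form prototype update in (6) and (7) can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the function names `solve_linear` and `fit_spherical_shell` and the optional `weights` argument are hypothetical and do not come from the paper. With unit weights the sums reproduce (6); passing weights equal to the memberships raised to the power m would give the fuzzy sums used in Section 2.2.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("H is singular; the points may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_spherical_shell(points, weights=None):
    """Return (center, radius) of one shell from p = -1/2 H^{-1} w."""
    dim = len(points[0])
    if weights is None:
        weights = [1.0] * len(points)  # hard case: each point counts once
    size = dim + 1
    h = [[0.0] * size for _ in range(size)]
    w = [0.0] * size
    for x, weight in zip(points, weights):
        y = list(x) + [1.0]                      # y_j = [x_j, 1]
        xx = sum(v * v for v in x)               # x_j^T x_j
        for r in range(size):
            w[r] += weight * 2.0 * xx * y[r]     # v_j = 2 (x_j^T x_j) y_j
            for c in range(size):
                h[r][c] += weight * y[r] * y[c]  # M_j = y_j y_j^T
    # p = -1/2 H^{-1} w is the solution of H p = -w / 2.
    p = solve_linear(h, [-0.5 * v for v in w])
    center = [-0.5 * v for v in p[:dim]]         # p = [-2 c, c^T c - r^2]
    radius_sq = sum(c * c for c in center) - p[dim]
    return center, math.sqrt(max(radius_sq, 0.0))
```

For noise-free points sampled from half of a circle, the fit recovers the center and radius without any iteration, which is the property that distinguishes this update from the Newton iterations of the FCS algorithm.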


2.2 The Fuzzy C Spherical Shells (FCSS) Algorithm 
For the fuzzy case, we minimize the following objective function:

$$J_S(L, U) = \sum_{i=1}^{K} \sum_{j=1}^{N} (\mu_{ij})^m d_{Sij}^2. \qquad (8)$$

In (8), $N$ is the total number of feature vectors, and $U = [\mu_{ij}]$ is a $K \times N$ matrix called the fuzzy
K-partition matrix [1] satisfying the following conditions:

$$\mu_{ij} \in [0,1] \ \text{for all } i \text{ and } j, \quad \sum_{i=1}^{K} \mu_{ij} = 1 \ \text{for all } j, \quad \text{and} \quad 0 < \sum_{j=1}^{N} \mu_{ij} < N \ \text{for all } i.$$

Here $\mu_{ij}$ is the grade of membership of the feature point $x_j$ in cluster $\lambda_i$, and $m \in [1, \infty)$ is a weighting
exponent called the fuzzifier. As in the hard case, it is easy to show that the vectors $p_i$ that
minimize (8) are given by (7), where

$$H_i = \sum_{j=1}^{N} (\mu_{ij})^m M_j \quad \text{and} \quad w_i = \sum_{j=1}^{N} (\mu_{ij})^m v_j, \qquad (9)$$

and $v_j$ and $M_j$ are given by (3). Using a proof similar to that of Bezdek's for the fuzzy C-means
algorithm [1], it can be shown that the memberships will be updated according to

$$\mu_{ik} = \begin{cases} \dfrac{1}{\sum_{j=1}^{K} \left( d_{Sik}^2 / d_{Sjk}^2 \right)^{1/(m-1)}} & \text{if } I_k = \emptyset, \\[2ex] 0 \ \text{for } i \notin I_k, \ \text{with} \ \sum_{i \in I_k} \mu_{ik} = 1 & \text{if } I_k \neq \emptyset, \end{cases} \qquad (10)$$

where $I_k = \{ i \mid 1 \le i \le K, \ d_{Sik}^2 = 0 \}$. The resulting Fuzzy C Spherical Shells (FCSS) algorithm is
summarized below.

THE FUZZY C SPHERICAL SHELLS (FCSS) ALGORITHM:
    Fix the number of clusters $K$; fix $m$, $1 < m < \infty$;
    Set iteration counter $l = 1$;
    Initialize the fuzzy K-partition $U^{(0)}$;
    Repeat
        Calculate $H_i^{(l)}$ and $w_i^{(l)}$ for each cluster $\lambda_i$ using (9);
        Compute $p_i^{(l)}$ for each cluster $\lambda_i$ using (7);
        Update $U^{(l)}$ using (10);
        Increment $l$;
    Until ($\|U^{(l-1)} - U^{(l)}\| < \epsilon$);

Both the hard and fuzzy C Spherical Shells algorithms require the inversion of the matrix $H_i$. This is
straightforward when the feature space is two-dimensional or three-dimensional. In the hard case, the
inverse will exist if there are at least n+1 non-collinear points in each cluster, where n is the
dimensionality of the feature space. In the fuzzy case, theoretically the inverse will always exist as
long as N > n+1 and the feature vectors are not collinear.
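The membership update in (10) can be illustrated with a small pure-Python sketch. The function name `update_memberships` and the nested-list layout `distances[i][k]` (squared distance of point k to prototype i) are assumptions made for this example. When a point lies exactly on several shells, (10) only requires that the memberships over those shells sum to one; the equal split used here is one possible choice, not something the paper prescribes.

```python
def update_memberships(distances, m=2.0):
    """Return u[i][k], the membership of point k in cluster i, per (10).

    distances[i][k] is the squared distance from point k to prototype i,
    and m > 1 is the fuzzifier.
    """
    n_clusters = len(distances)
    n_points = len(distances[0])
    exponent = 1.0 / (m - 1.0)
    u = [[0.0] * n_points for _ in range(n_clusters)]
    for k in range(n_points):
        # I_k: clusters whose shell passes exactly through point k.
        zero_set = [i for i in range(n_clusters) if distances[i][k] == 0.0]
        if zero_set:
            # Assumed tie-breaking rule: share membership equally over I_k.
            for i in zero_set:
                u[i][k] = 1.0 / len(zero_set)
            continue
        for i in range(n_clusters):
            total = sum((distances[i][k] / distances[j][k]) ** exponent
                        for j in range(n_clusters))
            u[i][k] = 1.0 / total
    return u
```

With two clusters, m = 2, and a point whose squared distances are 1 and 4, the memberships come out as 0.8 and 0.2, so the closer shell receives the larger share while each column of U still sums to one.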

3. The C Quadric Shells (CQS) Algorithms 

The C Spherical Shells algorithms can be generalized to include shells of (hyper)quadric
surfaces, rather than just (hyper)spherical shells. We first present the two-dimensional case
because it is easier to formulate. We then generalize the algorithm to the n-dimensional case.

Let $x_j = [x_{j1}, x_{j2}]^T$ be a point in the 2-D feature space. In the two-dimensional case, we
assume that each cluster resembles a second-degree curve. Therefore, the prototypes $\beta_i$ consist of
six parameters $[a_{i1}, a_{i2}, \ldots, a_{i6}]$ which define the equation of the curve. We define the distance
from $x_j$ to a prototype $\beta_i$ as:

$$d_{Qij}^2 = d_Q^2(x_j, \beta_i) = \left( a_{i1} x_{j1}^2 + a_{i2} x_{j2}^2 + \sqrt{2}\, a_{i3} x_{j1} x_{j2} + a_{i4} x_{j1} + a_{i5} x_{j2} + a_{i6} \right)^2. \qquad (11)$$

The subscript Q in the above equation stands for “quadric”. The right hand side of (11), when set
to zero, also represents the equation of the second-degree curve which the prototype represents.
The coefficient of the $x_{j1} x_{j2}$ term in (11) is assumed to be $\sqrt{2}\, a_{i3}$ without loss of generality. This
results in a simpler notation for a constraint that we will use later. Based on this distance measure,
we now define the hard and fuzzy CQS algorithms.


3.1. The Hard C Quadric Shells Algorithm 


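In the hard case, we define the objective function to be minimized as

$$J_Q(L) = \sum_{i=1}^{K} \sum_{x_j \in X_i} d_{Qij}^2, \qquad (12)$$

where $L = (\beta_1, \ldots, \beta_K)$, and $K$ is the number of clusters. In order to minimize the objective
function in (12), we rewrite the distance in (11) as

$$d_Q^2(x_j, \beta_i) = \left( x_j^T A_i x_j + x_j^T v_i + b_i \right)^2, \qquad (13)$$

where

$$A_i = \begin{bmatrix} a_{i1} & a_{i3}/\sqrt{2} \\ a_{i3}/\sqrt{2} & a_{i2} \end{bmatrix}, \quad v_i = \begin{bmatrix} a_{i4} \\ a_{i5} \end{bmatrix}, \quad \text{and} \quad b_i = a_{i6}. \qquad (14)$$

Equation (13) can be rewritten as

$$d_Q^2(x_j, \beta_i) = p_i^T M_j p_i, \qquad (15\text{-a})$$

where the $p_i$ represent the parameters of the prototypes of the clusters, and are given by

$$p_i^T = [p_{i1}^T, p_{i2}^T], \quad p_{i1}^T = [a_{i1}, a_{i2}, a_{i3}], \quad \text{and} \quad p_{i2}^T = [a_{i4}, a_{i5}, a_{i6}]. \qquad (15\text{-b})$$

The $M_j$ in (15-a) are given by

$$M_j = \begin{bmatrix} Q_j & R_j^T \\ R_j & S_j \end{bmatrix}, \qquad (15\text{-c})$$

where

$$Q_j = q_j q_j^T, \ \text{with} \ q_j^T = [x_{j1}^2, x_{j2}^2, \sqrt{2}\, x_{j1} x_{j2}], \qquad (16\text{-a})$$

$$S_j = s_j s_j^T, \ \text{with} \ s_j^T = [x_{j1}, x_{j2}, 1], \ \text{and} \qquad (16\text{-b})$$

$$R_j = s_j q_j^T. \qquad (16\text{-c})$$

Using (15) and (16), (12) can be written as

$$J_Q(L) = \sum_{i=1}^{K} \sum_{x_j \in X_i} p_i^T M_j p_i. \qquad (17)$$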
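As a quick sanity check on the rewriting steps, the sketch below evaluates the 2-D quadric distance in its three equivalent forms: the polynomial form (11), the matrix form (13) with (14), and the quadratic form (15-a) in the parameter vector. The function names are hypothetical. The third function relies on the observation that the block matrix in (15-c) is the outer product of the stacked vector [q; s] with itself.

```python
import math

SQRT2 = math.sqrt(2.0)


def distance_polynomial(x, a):
    """Equation (11): square of the second-degree polynomial in (x1, x2)."""
    x1, x2 = x
    a1, a2, a3, a4, a5, a6 = a
    value = (a1 * x1 * x1 + a2 * x2 * x2 + SQRT2 * a3 * x1 * x2
             + a4 * x1 + a5 * x2 + a6)
    return value ** 2


def distance_matrix_form(x, a):
    """Equations (13) and (14): (x^T A x + x^T v + b)^2."""
    a1, a2, a3, a4, a5, a6 = a
    A = [[a1, a3 / SQRT2], [a3 / SQRT2, a2]]
    v = [a4, a5]
    quad = sum(x[r] * A[r][c] * x[c] for r in range(2) for c in range(2))
    value = quad + x[0] * v[0] + x[1] * v[1] + a6
    return value ** 2


def distance_pmp_form(x, a):
    """Equation (15-a): p^T M p, with M built as the outer product z z^T."""
    x1, x2 = x
    q = [x1 * x1, x2 * x2, SQRT2 * x1 * x2]
    s = [x1, x2, 1.0]
    z = q + s          # stacked vector [q; s], so M = [[Q, R^T], [R, S]]
    p = list(a)        # p = [a1, ..., a6]
    return sum(p[r] * z[r] * z[c] * p[c] for r in range(6) for c in range(6))
```

For any parameter vector and point, the three functions agree up to rounding, and a point on the unit circle gives zero distance for the parameters [1, 1, 0, 0, 0, -1].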


$J_Q(L)$ is homogeneous with respect to $p_i$. Therefore, we need to constrain the problem in order to
avoid the trivial solution. Some of the possibilities are:

(i) $\|p_i\|^2 = 1$,

(ii) $b_i = 1$, and

(iii) $\mathrm{Tr}(A_i A_i^T) = \|p_{i1}\|^2 = 1$.

Constraint (ii) assumes that the curve does not pass through the origin. The last constraint has the
advantage that the resulting distance measure is invariant to rigid transformations of the prototype
[10]. However, if one is simply interested in clustering the data and not interested in obtaining
invariant parameters of the clusters, one may use constraint (i). We found that constraint (iii)
works best in practice.


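While minimizing (17), we may assume that the vectors $p_i$ are independent of each other.
Hence, the objective is to minimize

$$J_Q(p_i) = \sum_{x_j \in X_i} p_i^T M_j p_i \quad \text{subject to} \quad \|p_{i1}\|^2 = 1. \qquad (18)$$

If we define

$$F_i = \sum_{x_j \in X_i} Q_j, \quad G_i = \sum_{x_j \in X_i} R_j, \quad H_i = \sum_{x_j \in X_i} S_j, \qquad (19)$$

and

$$W_i = \sum_{x_j \in X_i} M_j = \begin{bmatrix} F_i & G_i^T \\ G_i & H_i \end{bmatrix},$$

then using a Lagrange multiplier we can recast (18) as

$$J_Q(p_i, \lambda) = p_i^T W_i p_i - \lambda \left( \|p_{i1}\|^2 - 1 \right). \qquad (20)$$

Setting the gradient of $J_Q(p_i, \lambda)$ with respect to $p_i$ equal to zero yields

$$W_i p_i = \lambda \begin{bmatrix} p_{i1} \\ 0 \end{bmatrix}, \quad \text{i.e.,} \quad \begin{bmatrix} F_i & G_i^T \\ G_i & H_i \end{bmatrix} \begin{bmatrix} p_{i1} \\ p_{i2} \end{bmatrix} = \lambda \begin{bmatrix} p_{i1} \\ 0 \end{bmatrix}. \qquad (21)$$

Equation (21) can be solved for $p_{i1}$ and $p_{i2}$: the second block row gives (22-b), and substituting it
into the first block row gives the eigenvalue problem in (22-a). The solution is given by

$$p_{i1} = \text{eigenvector of } \left( F_i - G_i^T H_i^{-1} G_i \right) \text{ associated with the smallest eigenvalue,} \qquad (22\text{-a})$$

$$p_{i2} = -H_i^{-1} G_i\, p_{i1}. \qquad (22\text{-b})$$

The resulting hard CQS algorithm is summarized below.


THE HARD C QUADRIC SHELLS (HCQS) ALGORITHM:
    Fix the number of clusters $K$;
    Set iteration counter $l = 1$;
    Apply the HCSS algorithm until it converges to initialize the hard K-partition;
    Repeat
        Calculate $F_i^{(l)}$, $G_i^{(l)}$ and $H_i^{(l)}$ using (19) and (16);
        Compute $p_i^{(l)}$ for each cluster using (22);
        Classify $x_j$ into cluster $\beta_i$ if $d_{Qij}^2 < d_{Qkj}^2$ for all $k \neq i$;
        Increment $l$;
    Until ($\|p^{(l-1)} - p^{(l)}\| < \epsilon$).


Note that the HCSS algorithm has to be applied in order to get a good initial K-partition.
Otherwise the performance of the HCQS algorithm is poor.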




3.2. The Fuzzy C Quadric Shells algorithm 


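For the fuzzy case, we minimize the objective function

$$J_Q(L, U) = \sum_{i=1}^{K} \sum_{j=1}^{N} (\mu_{ij})^m d_{Qij}^2, \qquad (23)$$

where $N$ is the total number of feature vectors and $U = [\mu_{ij}]$ is the $K \times N$ fuzzy K-partition
matrix, as described in Section 2.2. As in the hard case, it is easy to show that the vectors $p_i$ that
minimize (23) subject to the constraint $\|p_{i1}\|^2 = 1$ are given by (22), where

$$F_i = \sum_{j=1}^{N} (\mu_{ij})^m Q_j, \quad G_i = \sum_{j=1}^{N} (\mu_{ij})^m R_j, \quad H_i = \sum_{j=1}^{N} (\mu_{ij})^m S_j, \qquad (24)$$

and $Q_j$, $R_j$ and $S_j$ are as in (16).


Minimization with respect to the $\mu_{ij}$ can be done as before, if we follow Bezdek's theorem
for the fuzzy C-means [1]. It can be shown that the memberships will be updated according to

$$\mu_{ik} = \begin{cases} \dfrac{1}{\sum_{j=1}^{K} \left( d_{Qik}^2 / d_{Qjk}^2 \right)^{1/(m-1)}} & \text{if } I_k = \emptyset, \\[2ex] 0 \ \text{for } i \notin I_k, \ \text{with} \ \sum_{i \in I_k} \mu_{ik} = 1 & \text{if } I_k \neq \emptyset, \end{cases} \qquad (25)$$

where $I_k = \{ i \mid 1 \le i \le K, \ d_{Qik}^2 = 0 \}$.


The resulting FCQS algorithm is summarized at the end of Section 3.3.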


3.3. The n-dimensional case 


Both the hard and the fuzzy C Quadric Shells algorithms can be extended to the n-dimensional
case very easily. In this case, the distance measure is still in the form given by (13),
where $A_i$ is an $n \times n$ symmetric matrix given by

$$A_i = \begin{bmatrix} a_{i1} & a_{i(n+1)}/\sqrt{2} & \cdots & a_{i(2n-1)}/\sqrt{2} \\ a_{i(n+1)}/\sqrt{2} & a_{i2} & \cdots & \vdots \\ \vdots & & \ddots & a_{ir}/\sqrt{2} \\ a_{i(2n-1)}/\sqrt{2} & \cdots & a_{ir}/\sqrt{2} & a_{in} \end{bmatrix}, \qquad (26\text{-a})$$

and $v_i$ is an $n \times 1$ vector given by

$$v_i = \begin{bmatrix} a_{i(r+1)} \\ \vdots \\ a_{i(r+n)} \end{bmatrix}, \quad \text{and} \quad b_i = a_{i(r+n+1)}, \quad \text{where} \ r = \frac{n(n+1)}{2}. \qquad (26\text{-b})$$

This distance measure can again be written as (15-a), if the $p_i$ are given by

$$p_i^T = [p_{i1}^T, p_{i2}^T], \quad p_{i1}^T = [a_{i1}, a_{i2}, \ldots, a_{ir}], \quad \text{and} \quad p_{i2}^T = [a_{i(r+1)}, \ldots, a_{i(r+n+1)}].$$

Similarly, (16) needs to be modified as:

$$Q_j = q_j q_j^T, \ \text{where} \ q_j^T = [x_{j1}^2, x_{j2}^2, \ldots, x_{jn}^2, \sqrt{2}\, x_{j1} x_{j2}, \ldots, \sqrt{2}\, x_{jk} x_{jl}, \ldots, \sqrt{2}\, x_{j(n-1)} x_{jn}], \ \text{and} \qquad (27\text{-a})$$

$$S_j = s_j s_j^T, \ \text{where} \ s_j^T = [x_{j1}, x_{j2}, \ldots, x_{jn}, 1]. \qquad (27\text{-b})$$

The hard CQS algorithm remains the same if these new definitions are used. The fuzzy CQS
algorithm is summarized below.

THE FUZZY C QUADRIC SHELLS (FCQS) ALGORITHM:
    Fix the number of clusters $K$; fix $m$, $1 < m < \infty$;
    Set iteration counter $l = 1$;
    Initialize the fuzzy K-partition $U^{(0)}$ using the FCSS algorithm;
    Repeat
        Calculate $F_i^{(l)}$, $G_i^{(l)}$ and $H_i^{(l)}$ for each cluster $\beta_i$ using (24) and (27);
        Compute $p_i^{(l)}$ for each cluster $\beta_i$ using (22);
        Update $U^{(l)}$ using (25);
        Increment $l$;
    Until ($\|U^{(l-1)} - U^{(l)}\| < \epsilon$);

The FCQS algorithm is somewhat sensitive to the initialization process. Hence the FCSS algorithm
needs to be applied to obtain a reasonable initial fuzzy K-partition $U^{(0)}$.


3.4 The Modified C Quadric Shells Algorithm 


The distance $d_Q^2(x_j, \beta_i)$ defined by (13) is highly nonlinear in nature. It is easy to show
through simple examples that this distance is sensitive to the placement of $x_j$ with respect to the
prototype $\beta_i$. This does not cause problems if the clusters are well-defined quadric shells.
However, if the clusters are ill-defined or if there is a lot of noise, the resulting estimates of the
parameters of $\beta_i$ and the memberships $\mu_{ij}$ can be significantly influenced by outliers. To alleviate
this problem, one may use the shortest (perpendicular) distance (denoted by $d_{Pij}^2$) between the
point $x_j$ and the shell $\beta_i$. We now describe how this is achieved.


Let $x_j$ be a point in the feature space. The distance measure $d_{Pij}^2$ is the minimum distance
from the point $x_j$ to the quadric curve $\beta_i$ describing the cluster. Finding $d_{Pij}^2$ can be formulated
as:

$$\min_z \|x_j - z\|^2 \quad \text{such that} \quad z^T A_i z + z^T v_i + b_i = 0, \qquad (28)$$

where $z$ is a point lying on the quadric curve describing cluster $\beta_i$. Using a Lagrange multiplier $\lambda$,
(28) reduces to minimizing $\|x_j - z\|^2 - \lambda (z^T A_i z + z^T v_i + b_i)$ with respect to $z$ and $\lambda$. This yields

$$2 (x_j - z) + \lambda (2 A_i z + v_i) = 0, \ \text{and} \qquad (29)$$

$$z^T A_i z + z^T v_i + b_i = 0. \qquad (30)$$

Equation (29) can be solved for $z$ as

$$z = \frac{1}{2} (I - \lambda A_i)^{-1} (\lambda v_i + 2 x_j). \qquad (31)$$

Substituting (31) in (30) yields an equation in $\lambda$ which is a quartic (fourth-degree) equation in the
2-D case, and has at most four real roots $\lambda_k$, $1 \le k \le 4$. For higher dimensions the equation is of
6th degree or higher. Solving for the four roots in the 2-D case is quite straightforward if one uses
the standard solution [11]. The resulting expressions are rather long and cumbersome, involving
nested square roots, and hence they are not presented here. For each real root $\lambda_k$ so computed, we
calculate the corresponding $z_k$ using (31). Then, we compute $d_{Pij}^2$ using

$$d_{Pij}^2 = \min_k \|x_j - z_k\|^2. \qquad (32)$$

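One can formulate the FCQS algorithm using $d_{Pij}^2$ as the underlying distance measure. In this case,
the objective function to be minimized becomes

$$\sum_{i=1}^{K} \sum_{j=1}^{N} (\mu_{ij})^m d_{Pij}^2. \qquad (33)$$

Minimizing this function with respect to $U$ yields

$$\mu_{ik} = \begin{cases} \dfrac{1}{\sum_{j=1}^{K} \left( d_{Pik}^2 / d_{Pjk}^2 \right)^{1/(m-1)}} & \text{if } I_k = \emptyset, \\[2ex] 0 \ \text{for } i \notin I_k, \ \text{with} \ \sum_{i \in I_k} \mu_{ik} = 1 & \text{if } I_k \neq \emptyset, \end{cases} \qquad (34)$$

where $I_k = \{ i \mid 1 \le i \le K, \ d_{Pik}^2 = 0 \}$.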

However, minimizing (33) with respect to the parameters $p_i$ results in coupled nonlinear equations
with no closed-form solution. To overcome this problem, we may assume that we can obtain
approximately the same solution by using (22), which will be true if all the feature points lie very
close to the hyperquadric shells. This assumption leads to the following modified FCQS algorithm.

THE MODIFIED FUZZY C-QUADRIC SHELLS (MFCQS) ALGORITHM:
    Fix the number of clusters $K$; fix $m$, $1 < m < \infty$;
    Set iteration counter $l = 1$;
    Initialize the fuzzy K-partition $U^{(0)}$ using the FCSS algorithm;
    Repeat
        Calculate $F_i^{(l)}$, $G_i^{(l)}$ and $H_i^{(l)}$ for each cluster $\beta_i$ using (24) and (16);
        Compute $p_i^{(l)}$ for each cluster $\beta_i$ using (22);
        Compute $d_{Pij}^2$ using (30), (31) and (32);
        Update $U^{(l)}$ using (34);
        Increment $l$;
    Until ($\|U^{(l-1)} - U^{(l)}\| < \epsilon$);




It is to be noted that this modified algorithm is easy to implement only in the 2-D case. In higher
dimensions, solving for $d_{Pij}^2$ is not trivial. In practice, we found that in the 2-D case the modified
FCQS algorithm converges much faster than the original version. This may be attributed to the fact
that the membership assignment based on the perpendicular distance is more reasonable.
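The difference between the two distance measures is easiest to see for a circle, where the perpendicular distance has a simple closed form and no quartic needs to be solved. The sketch below is an illustration for this special case only, not the general quadric procedure of (28) to (32), and the function names are hypothetical. It compares the algebraic distance of (1) with the squared shortest distance to the circle.

```python
import math


def algebraic_distance(x, center, radius):
    """Algebraic distance of equation (1): (||x - c||^2 - r^2)^2."""
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return (sq - radius ** 2) ** 2


def perpendicular_distance(x, center, radius):
    """Squared shortest distance from x to the circle: (||x - c|| - r)^2."""
    norm = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
    return (norm - radius) ** 2


# Two points that are both exactly one unit away from a circle of radius 3.
inside = (2.0, 0.0)
outside = (4.0, 0.0)
center, radius = (0.0, 0.0), 3.0
print(algebraic_distance(inside, center, radius),
      algebraic_distance(outside, center, radius))
print(perpendicular_distance(inside, center, radius),
      perpendicular_distance(outside, center, radius))
```

Both points are the same geometric distance from the shell, yet the algebraic distance is 25 for the inner point and 49 for the outer one, and it grows much faster for far-away points. This is one way to see why outliers can dominate the memberships and parameter estimates when the algebraic distance is used.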

4. Determination of the Optimum Number of Clusters 


The algorithms discussed in Sections 2 and 3 assume that the number of clusters K is 
known. This is indeed the case in many pattern recognition applications and some computer vision 
applications. When the number of clusters is unknown, one method to determine the optimal 
number of clusters is to perform clustering for a range of K values, and pick the K value for 
which a suitable performance measure is minimized (or maximized). For the fuzzy CQS 
algorithm, we define a new performance (or cluster validity) measure called the total fuzzy shell 
thickness as follows.

$$T_Q(K) = \sum_{i=1}^{K} \sum_{j=1}^{N} (\mu_{ij})^m d_{Pij}^2, \qquad (35)$$

which is also the objective function in (33). $T_Q(K)$ will be small if all points lie close to one of the
$K$ quadric shells. In the hard case, the total shell thickness measure becomes

$$T_Q(K) = \sum_{i=1}^{K} \sum_{x_j \in X_i} d_{Pij}^2. \qquad (36)$$

In the special case where all the quadric curves representing the clusters are (hyper)spheres,
another validity measure called the total fuzzy average shell thickness for spherical shells may be
defined as

$$T_S(K) = \sum_{i=1}^{K} \frac{\sum_{j=1}^{N} (\mu_{ij})^m \left( \|x_j - c_i\| - r_i \right)^2}{\sum_{j=1}^{N} (\mu_{ij})^m}. \qquad (37)$$

In the hard case, this becomes

$$T_S(K) = \sum_{i=1}^{K} \frac{1}{N_i} \sum_{x_j \in X_i} \left( \|x_j - c_i\| - r_i \right)^2, \qquad (38)$$

where $N_i$ is the number of points in cluster $\lambda_i$.




To find the optimum number of clusters when the FCQS algorithm is used, one can start
with $K = 1$, and keep incrementing $K$ while calculating $T_Q(K)$ after each run of the FCQS
algorithm, and stop as soon as a knee point or a local minimum in the curve of $T_Q(K)$ is found (or
$K$ reaches $K_{max}$). This unsupervised algorithm is summarized below.


THE UNSUPERVISED FUZZY C QUADRIC SHELLS ALGORITHM:
    Set K = 1; fix m, 1 < m < infinity;
    local_min_or_knee_point = false;
    While K <= Kmax and local_min_or_knee_point = false do
        Perform the FCQS algorithm with the number of clusters = K;
        Calculate T_Q(K) as given by (35);
        If T_Q(K-1) is a local minimum or a knee point Then
            local_min_or_knee_point = true;
            Koptimal = K - 1;
        Else
            K = K + 1;
        End If
    End While

Similar unsupervised algorithms can be designed with the HCSS, FCSS, and HCQS algorithms. 
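The search over K can be sketched as follows. Several details here are assumptions made for illustration and are not specified in the paper: `thickness_for` is a hypothetical callable that runs the clustering with K clusters and returns the thickness measure, the knee test (a large drop into K-1 followed by a drop smaller than `knee_ratio` times that amount) is only one possible definition of a knee point, and the fallback to the smallest observed thickness when K reaches the maximum is likewise a choice of this sketch.

```python
def select_number_of_clusters(thickness_for, k_max, knee_ratio=0.1):
    """Return the K at which the thickness curve has a local minimum or knee.

    thickness_for(K) is assumed to run the clustering with K clusters and
    return the total shell thickness for that K.
    """
    history = {}
    k = 1
    while k <= k_max:
        history[k] = thickness_for(k)
        if k == 2 and history[1] <= history[2]:
            return 1                       # thickness already rises after K = 1
        if k >= 3:
            prev2, prev1, cur = history[k - 2], history[k - 1], history[k]
            is_local_min = prev1 < prev2 and prev1 <= cur
            # Assumed knee test: big improvement into K-1, little afterwards.
            drop_in = prev2 - prev1
            drop_out = prev1 - cur
            is_knee = drop_in > 0 and 0 <= drop_out < knee_ratio * drop_in
            if is_local_min or is_knee:
                return k - 1               # corresponds to Koptimal = K - 1
        k += 1
    # No knee or local minimum found: fall back to the smallest thickness seen.
    return min(history, key=history.get)
```

As in the pseudocode, the decision about K-1 can only be made after the clustering for K has been run, so the search always performs one run beyond the selected value unless it stops at the maximum.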

5. Experimental Results 


In this section, we illustrate the algorithms presented in Sections 2, 3, and 4 through 
several examples. We present only results of two-dimensional data sets in this paper, even though 
the algorithms presented are applicable to feature spaces of any dimension. In general, we found 
that the hard CSS and CQS algorithms are about an order of magnitude faster than their fuzzy
counterparts, but they perform well only when the clusters are not highly entangled. Thus, the hard
versions are not very robust, and hence we do not present their results in this paper.

We first show the results of the FCSS algorithm. In these examples, the number of clusters 
is assumed to be known, although an unsupervised algorithm to find the optimum number of 
clusters can easily be devised using the performance measure in (37). In all the examples shown, 
the FCSS algorithm was applied with the fuzzifier m = 5. Smaller values did not yield good 
results. This may be because we initialize the fuzzy partition matrix U with the fuzzy C means 
algorithm [1] which does not yield a good partition of the clusters, particularly in the case of 
overlapping or concentric circles. By making the partitioning as fuzzy as possible, it is possible to 
disentangle the intertwined clusters from each other using the FCSS algorithm. The data sets in 
images of size 200x200 shown in Figure 1 were artificially generated, and had between 50 and 
200 feature points. Uniformly distributed noise with an interval of 3 was added to the feature point 
locations so that they do not always lie exactly on the ideal circles. In addition, noise points were 
added at random locations to some of the data sets. 

Figure 1(a) shows the result of clustering two semicircles contaminated by noise. This 
example shows that the algorithm is successful even when only parts of circles are present. The 
second example in Figure 1(b) consists of two concentric circles contaminated by a few noise 
points. This is an example where conventional clustering methods fail miserably. As seen in Figure 
1(b), the two concentric circles are correctly classified, and the noise points are assigned to the 
closest cluster. Figure 1(c) shows the clustering of five sparsely sampled overlapping circles. This
is a very difficult case, because the circles are truly entangled, and the initial partition is quite
wrong. The CPU time required on a Sun 4 workstation to run the FCSS algorithm is typically on
the order of 1 s.


We next present the results of the unsupervised MFCQS algorithm. In all the examples
shown in this paper, the initial fuzzy K-partition was obtained as follows. The Fuzzy C-Means
algorithm was first applied with the fuzzifier m = 2 for five iterations. The resulting fuzzy partition
U was quite poor at this point. Therefore, this was followed by the application of the FCSS
algorithm with m = 2. After the FCSS algorithm converged, the MFCQS algorithm was applied
with m = 2.



Figure 1: Examples of clustering using the Fuzzy C Spherical Shells algorithm: (a) two semicircles
with noise, (b) two concentric circles, and (c) five entangled sparsely sampled circles.

Figure 2: Examples of clustering using the modified fuzzy C Quadric Shells algorithm: (a) three
overlapping circles, (b) an ellipse enclosed by two overlapping circles, (c) three parabolas with
different orientations, and (d) a mixture of three types of quadrics: a circle, an ellipse and a
parabola.


Figure 2 shows some examples that are typical of boundary detection problems. The data
sets in images of size 200x200 were artificially generated, and had between 50 and 200 points.
Uniformly distributed noise with an interval of 3 to 5 was added to the feature point locations so
that they do not always lie on ideal quadric curves. Example 1 consists of three overlapping circles.
Example 2 shows two overlapping circles enclosing an ellipse. Example 3 shows three
differently oriented parabolas, and Example 4 shows three different types of quadric curves: a
parabola, a circle and an ellipse, all in the same data set. In all cases the clusters criss-cross one
another, and conventional methods cannot separate them. The plot of the total fuzzy shell thickness
vs. the number of clusters for these four examples is shown in Figure 3. In every case, the knee
point or the local minimum is clearly defined, and picking the optimum K is quite simple. The
MFCQS algorithm clusters all these data sets successfully, and the results are excellent.



Figure 3: The plot of the total fuzzy thickness vs the number of clusters. 





Figure 4: Examples of unsupervised modified fuzzy quadric shell clustering: (a) two (partial)
circular clusters, (b) two elliptical clusters, (c) a parabolic cluster and an elliptic cluster, and (d)
two crossing elliptical clusters.

Figure 4 shows several situations that are more typical of pattern recognition problems. 
These data sets, in images of size 200x200, were also artificially generated, and each example has 
between 200 and 350 points. Uniformly distributed noise with an interval of 10 to 15 was added to 
the feature point locations so that they do not lie on ideal quadric curves. Figure 4(a) (Example 5) 
shows the result of the unsupervised MFCQS clustering of two circular clusters with different 
radii. Figure 4(b) (Example 6) shows the result of the unsupervised MFCQS clustering of two 
semi-elliptical clusters with very different major and minor axes. Figure 4(c) (Example 7) shows the 
result obtained when the unsupervised MFCQS algorithm was used on a parabolic cluster and an 
elliptic cluster. Finally, Figure 4(d) (Example 8) shows the unsupervised MFCQS clustering of two 
crossing elliptic clusters. These examples show that the MFCQS algorithm is effective even when the 
clusters are very different in size and when only partial curves are present. The plot of the total 
fuzzy shell thickness against the number of clusters is shown in Figure 5. Here again, it is very 
easy to pick the knee point and the optimum number of clusters. 
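A data set of this kind could be generated along the lines of the Python sketch below, which samples points on a partial circle and perturbs them with uniform noise. The function name noisy_arc, the centers, radii, angular ranges and point counts are hypothetical, and reading "an interval of 10" as noise drawn from [-5, 5] in each coordinate is an assumption, since the exact noise model is not spelled out in the text.

```python
import math
import random


def noisy_arc(center, radius, n_points, noise_width,
              start=0.0, end=2.0 * math.pi, rng=None):
    """Sample points on a circular arc and add uniform noise.

    Assumption: noise_width is the full width of a zero-centred uniform
    interval applied independently to the x and y coordinates.
    """
    rng = rng or random.Random(0)
    half = noise_width / 2.0
    points = []
    for _ in range(n_points):
        theta = rng.uniform(start, end)
        x = center[0] + radius * math.cos(theta) + rng.uniform(-half, half)
        y = center[1] + radius * math.sin(theta) + rng.uniform(-half, half)
        points.append((x, y))
    return points


# Two partial circular clusters with different radii, loosely in the
# spirit of Example 5 (all numbers here are illustrative only).
rng = random.Random(42)
arc_a = noisy_arc((70.0, 100.0), 40.0, 120, 10.0, 0.0, math.pi, rng)
arc_b = noisy_arc((130.0, 100.0), 60.0, 150, 10.0, math.pi / 2, 3 * math.pi / 2, rng)
data = arc_a + arc_b
print(len(data))
```

The sketch does not clip the points to the 200x200 image, so the centers and radii need to be chosen with the noise width in mind.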



In all the examples shown, the MFCQS algorithm performed successfully, yielding the 
correct final partition of the data set. It performed well in the presence of quadric clusters of the 
same type or of different types and sizes. It also performed well with clusters that represent partial 
quadric curves of the same type or different types and sizes. The MFCQS algorithm typically 
converged in fewer than 20 iterations. The CPU time required on a Sun 4 workstation to run the 
MFCQS algorithm was typically on the order of 15 s. This is very reasonable considering the 
complexity of the problems. 

6. Conclusions 

In this paper, we introduced new hard and fuzzy clustering algorithms called the C 
Spherical Shells (CSS) and C Quadric Shells (CQS) algorithms. These algorithms are specifically 
designed to seek clusters that can be described by segments of second-degree curves, or more 
generally by segments of shells of hyperquadrics. These algorithms can potentially lead to a more 
general class of algorithms that deal with shells of more complex types. Most objective-function- 
based clustering algorithms in the literature consider only filled clusters, and hence they cannot be 
used when the clusters are hollow. The few shell clustering algorithms that have considered hollow 
clusters work only for clusters of specific shapes such as circles or ellipses. The CSS algorithms 
are excellent for the detection of circular boundaries or clusters. The advantage of our CQS 
algorithms lies in the fact that they can be used to cluster mixtures of all types of hyperquadric 
shells such as hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids, and 
hypercylinders. The examples shown in Section 5 of clustering in the two-dimensional case 
illustrate the superior performance of the proposed algorithms. The hard versions do not perform 
as well when the clusters are highly entangled, which shows the benefits of the fuzzy approach. 
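The difference between hard and fuzzy assignment to shell clusters can be illustrated with a small Python sketch for two crossing circles. The shell distance used here, the squared difference between a point's distance to the center and the radius, and the membership formula, the standard fuzzy c-means form with fuzzifier m, are assumptions chosen for simplicity and are not necessarily the exact expressions used by the algorithms above. The function names and the two circles are hypothetical.

```python
import math


def shell_dist2(point, center, radius):
    """Squared distance from a point to a circular shell (assumed form)."""
    return (math.dist(point, center) - radius) ** 2


def fuzzy_memberships(point, shells, m=2.0, eps=1e-12):
    """Standard fuzzy c-means style memberships computed from shell distances."""
    d2 = [shell_dist2(point, c, r) + eps for c, r in shells]
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d2) for di in d2]


def hard_assignment(point, shells):
    """Index of the nearest shell (hard clustering)."""
    d2 = [shell_dist2(point, c, r) for c, r in shells]
    return d2.index(min(d2))


# Two crossing circles given as (center, radius); values are illustrative.
shells = [((0.0, 0.0), 10.0), ((15.0, 0.0), 10.0)]

on_first = (10.0, 0.0)   # lies exactly on the first circle
ambiguous = (7.5, 3.0)   # equally far from both shells

print(fuzzy_memberships(on_first, shells), hard_assignment(on_first, shells))
print(fuzzy_memberships(ambiguous, shells), hard_assignment(ambiguous, shells))
```

For the ambiguous point, the hard rule must commit to one shell, whereas the fuzzy memberships split evenly, so such a point does not pull either shell strongly toward itself.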

The proposed algorithms also have several advantages over the generalized Hough 
transform (GHT) methods that have traditionally been used to detect shapes of known 
descriptions. One disadvantage of the GHT approach is that one needs to use a different GHT for 
each type of curve. For example, one needs a GHT for circles, another for ellipses, and yet another 
for parabolas. Although one could devise a GHT that can cover all types of second-degree curves 
(or hyperquadrics), the dimensionality of the resulting parameter space (six in the case of 
second-degree curves in 2-D) will be very large, and the resulting GHT would be computationally 
very expensive. The memory requirements can also be prohibitive [12]. The speed of the GHT can be 
improved only if we make certain assumptions about the curve (for example, that the curve is an 
ellipse) and if the gradient information is available. Also, our algorithms work well even when 
the edge points are somewhat scattered around the ideal curve (or hypersurface), which causes bin 
splitting in the GHT. Our algorithms can also locate small shell segments much better: small peaks 
in the GHT are lost in the bias [13], and selecting a suitable threshold is difficult. 
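To give a rough sense of why the memory requirements grow so quickly, the following Python sketch counts the accumulator cells of a Hough-style parameter space. The choice of 64 bins per parameter and 4 bytes per cell is an assumption made purely for illustration. The three-parameter case corresponds to circles (center coordinates and radius), and the six-parameter case to the general second-degree curve mentioned above.

```python
def accumulator_cells(n_params, bins_per_param):
    """Number of cells in a dense accumulator with equal bins per parameter."""
    return bins_per_param ** n_params


def accumulator_bytes(n_params, bins_per_param, bytes_per_cell=4):
    """Memory for the dense accumulator, assuming a fixed-size counter per cell."""
    return accumulator_cells(n_params, bins_per_param) * bytes_per_cell


BINS = 64  # hypothetical resolution per parameter

circle_cells = accumulator_cells(3, BINS)    # 262144 cells
quadric_cells = accumulator_cells(6, BINS)   # 68719476736 cells

print(circle_cells, accumulator_bytes(3, BINS) / 2**20, "MiB")
print(quadric_cells, accumulator_bytes(6, BINS) / 2**30, "GiB")
```

Under these assumptions the circle accumulator needs 1 MiB, while the six-parameter accumulator needs 256 GiB, which is the kind of gap that makes a single general-purpose GHT impractical.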